Recognition of writer-independent off-line handwritten Arabic (Indian) numerals using hidden Markov models

Author

  • Sabri A. Mahmoud
Abstract

This paper describes a technique for the optical recognition of off-line handwritten Arabic (Indian) numerals using hidden Markov models (HMM). The success of HMMs in speech recognition has encouraged researchers to apply them to text recognition. In this work we did not follow the general trend of generating features with sliding windows moved along the direction of the writing line. Instead, we generated features using the digit as the unit. Angle, distance, horizontal-span, and vertical-span features are extracted from Arabic (Indian) numerals and used in training and testing the HMM. These features proved to be simple and effective. In addition to the HMM, a nearest neighbor classifier is used, and the results of the two classifiers are compared. Several experiments were conducted to estimate a suitable number of states for the HMM; the best results were achieved with a 10-state HMM. We also experimented with different numbers of features; the best results were achieved with a 120-element feature vector representing each digit. The database contains samples from 44 writers, each of whom wrote 48 samples of each digit, giving a total of 21,120 samples. The data were size-normalized to make the technique size invariant, and the center of gravity of the digit is used during feature extraction to make the technique translation invariant. A randomization technique was used to generate Arabic (Indian) numbers for training and testing the HMM classifier; both the number of digits per number and the digit sequence were randomized. A total of 2171 Arabic (Indian) numbers were generated, containing 21,120 digits: 1700 numbers (16,657 digits) were used to train the HMM and 471 numbers (4463 digits) were used to test it. The samples of the first 24 writers were used to train the nearest neighbor classifier, and the samples of the remaining 20 writers were used for testing.
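The abstract names the four feature groups but does not define them. The Python/NumPy sketch below shows one plausible reading and is meant only as an illustration, not as the paper's method. The digit is cropped to its ink bounding box and resampled to a fixed grid (size invariance). Angle and distance features are then computed in angular sectors around the center of gravity (translation invariance), and horizontal and vertical spans are measured in bands. Several details are assumptions: the 32×32 grid, the split of the 120 values into four groups of 30, the exact per-group definitions, and every function name (`normalize_size`, `angle_distance_features`, `span_features`, `extract_features`).

```python
import numpy as np

N_BINS = 30  # hypothetical: 4 feature groups x 30 bins = 120 values per digit


def normalize_size(img, size=32):
    """Crop a binary digit image to its ink bounding box and resample to size x size."""
    ys, xs = np.nonzero(img)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    rows = np.arange(size) * crop.shape[0] // size  # nearest-neighbour row indices
    cols = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(rows, cols)] > 0


def angle_distance_features(img, n=N_BINS):
    """Ink share and mean distance per angular sector around the center of gravity."""
    ys, xs = np.nonzero(img)
    dy, dx = ys - ys.mean(), xs - xs.mean()  # offsets from the center of gravity
    sector = (np.arctan2(dy, dx) % (2 * np.pi) / (2 * np.pi) * n).astype(int) % n
    dist = np.hypot(dy, dx)
    counts = np.bincount(sector, minlength=n)
    dist_sum = np.bincount(sector, weights=dist, minlength=n)
    mean_dist = np.divide(dist_sum, counts, out=np.zeros(n), where=counts > 0)
    return counts / counts.sum(), mean_dist / (dist.max() + 1e-9)


def span_features(img, n=N_BINS):
    """Relative ink extent in n horizontal bands (hspan) and n vertical bands (vspan)."""
    h, w = img.shape
    hspan, vspan = np.zeros(n), np.zeros(n)
    for i, band in enumerate(np.array_split(np.arange(h), n)):
        ink_cols = np.nonzero(img[band].any(axis=0))[0]
        if ink_cols.size:
            hspan[i] = (ink_cols[-1] - ink_cols[0] + 1) / w
    for i, band in enumerate(np.array_split(np.arange(w), n)):
        ink_rows = np.nonzero(img[:, band].any(axis=1))[0]
        if ink_rows.size:
            vspan[i] = (ink_rows[-1] - ink_rows[0] + 1) / h
    return hspan, vspan


def extract_features(img):
    """Concatenate the four hypothetical feature groups into one 120-value vector."""
    norm = normalize_size(img)
    return np.concatenate([*angle_distance_features(norm), *span_features(norm)])


# Usage: an L-shaped stroke and a shifted copy of it.
digit = np.zeros((50, 50), dtype=np.uint8)
digit[10:40, 20:25] = 1
digit[35:40, 20:40] = 1
shifted = np.roll(digit, (5, -8), axis=(0, 1))
features = extract_features(digit)
```

Because the image is cropped before the center of gravity is taken, a shifted copy of the same digit produces an identical vector. Scale invariance is only approximate here, since nearest-neighbour resampling is a simplification.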
The achieved average recognition rates are 97.99% and 94.35% using the HMM and the nearest neighbor classifiers, respectively. Analysis of the classification errors showed that some can be attributed to bad data, some to deformation and unbalanced proportions of digit segments, and some to different writing styles of certain digits. Confusions between specific digit pairs were also identified and analyzed, and the remaining errors are genuine. For the HMM, the real misclassification rate on genuine data was nearly 1%. This demonstrates the effectiveness of the presented technique for writer-independent off-line handwritten Arabic (Indian) digit recognition. The technique is writer independent because the data of some writers were used to train the classifiers and the data of other writers were used in the testing phase. © 2007 Elsevier B.V. All rights reserved.
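Writer independence comes from how the data are split, not from the classifier itself: no writer contributes samples to both the training and the test set. The sketch below shows that protocol for the nearest neighbor classifier. It assumes 1-NN with Euclidean distance (the abstract does not state the metric) and writer IDs whose sorted order defines "the first 24 writers". The helper names `writer_split` and `nearest_neighbor_predict` are hypothetical, and the usage data are synthetic, not the paper's database.

```python
import numpy as np


def writer_split(writer_ids, n_train_writers=24):
    """Boolean train/test masks that keep every writer on exactly one side."""
    writers = np.unique(writer_ids)  # sorted writer IDs
    train_mask = np.isin(writer_ids, writers[:n_train_writers])
    return train_mask, ~train_mask


def nearest_neighbor_predict(train_x, train_y, test_x, chunk=512):
    """1-NN with squared Euclidean distance, processed in chunks to bound memory."""
    train_sq = (train_x ** 2).sum(axis=1)
    preds = []
    for start in range(0, len(test_x), chunk):
        block = test_x[start:start + chunk]
        # ||a-b||^2 = ||a||^2 - 2ab + ||b||^2; ||a||^2 is the same for every
        # training sample, so dropping it does not change the argmin.
        dist = train_sq[None, :] - 2.0 * block @ train_x.T
        preds.append(train_y[dist.argmin(axis=1)])
    return np.concatenate(preds)


# Usage with synthetic, well-separated 120-value feature vectors.
rng = np.random.default_rng(0)
writer_ids = np.repeat(np.arange(44), 20)          # 44 writers x 20 samples each
labels = np.tile(np.repeat(np.arange(10), 2), 44)  # 2 samples per digit per writer
centers = rng.normal(scale=10.0, size=(10, 120))   # synthetic class centers
features = centers[labels] + rng.normal(scale=0.1, size=(labels.size, 120))
train, test = writer_split(writer_ids)
pred = nearest_neighbor_predict(features[train], labels[train], features[test])
accuracy = (pred == labels[test]).mean()
```

Chunking keeps the distance matrix to `chunk` × (number of training samples) entries at a time. For a database of this size, a single full pairwise matrix would be needlessly large.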

Similar resources

Recognition of Handwritten Arabic (Indian) Numerals using Radon- Fourier-based Features

This paper describes a technique for the recognition of off-line handwritten Arabic (Indian) numerals using Radon-Fourier-based features. A two stage classification scheme is used. The Nearest Mean (NMC), K-Nearest Neighbor (K-NNC), and Hidden Markov Models (HMMC) Classifiers are used in the first stage and a Structural Classifier (SC) is used in the second stage. A database of 44 writers with ...

Automatic Recognition of Off-line Handwritten Arabic (Indian) Numerals Using Support Vector and Extreme Learning Machines

This paper describes a technique using Support Vector (SVM) and Extreme Learning Machines (ELM) for automatic recognition of off-line handwritten Arabic (Indian) numerals. The features of angle, distance, horizontal, and vertical span are extracted from these numerals. The database has 44 writers with 48 samples of each digit totaling 21120 samples. A two-stage exhaustive parameter estimation t...

The use of Radon Transform in Handwritten Arabic (Indian) Numerals Recognition

This paper describes a technique for the recognition of off-line handwritten Arabic (Indian) numerals using Radon and Fourier Transforms. Radon-Fourier-based features are used to represent Arabic digits. Nearest Mean Classifier (NMC), K-Nearest Neighbor Classifier (K-NNC), and Hidden Markov Models Classifier (HMMC) are used. Analysis using different number of projections, varying the number of ...

Off-line Arabic Handwritten Isolated Character Recognition using Hidden Markov Models

This paper presents a recognition system for Arabic handwritten isolated characters. The recognition system is based on hidden Markov model (HMM). The entire system is capable of recognizing the Arabic handwritten characters. First, the system removes all the variation in the character images. Second, Features are extracted using the sliding window technique with HMM. Then, the HMM is used for ...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Journal:
  • Signal Processing

Volume 88

Publication date: 2008